Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New txn operation #3197

Closed
wants to merge 37 commits into from
Closed

New txn operation #3197

wants to merge 37 commits into from

Conversation

srfrog
Copy link
Contributor

@srfrog srfrog commented Mar 23, 2019

This PR adds a new type of mutation block called txn{} that executes a mutation(s) depending on the result of a conditional query in the same transaction. See original issue for initial problem description: #3059

Some changes from the original issue:

  1. Subject and object uid var values are retrieved using a keyword uid instead of predicate parsing (inside angle brackets).
  2. Both subject and object can use uid vars.
  3. More than one matching uid in query will cause error (see Update at bottom).

Test mutation:

// Schema:
name: string .
email: string @index(exact) .

// Mutation:
txn {
  query {
    me(func: eq(email, "someone@gmail.com"), first: 1) {
      v as uid
    }
  }

  mutation {
    set {
      uid(v) <name> "Some One" .
      uid(v) <email> "someone@gmail.com" .
    }
  }
}

Breaking changes

  • Changes to api.NQuad:
    • Added new field SubjectVar to hold name of variable name to replace subject Uid.
    • Added new field ObjectVar to hold name of variable name to replace object Uid.
  • Changes to api.Mutation:
    • Added new field CondQuery to pass the conditional query from client.

See all commits: dgraph-io/dgo@ac2ab89

Update

Requested changes from team's code-review:

  • Return an error if the variable is matched with more than one uid for now.
  • There should not be an error if the conditional query does not match any uid.
  • Add support to use the matched variable for object as well (besides subject).
  • All filter logic is kept in the conditional query and the mutation is executed for each var request. A mutation with no vars requested is executed once.

Closes #3059


This change is Reviewable

@MichelDiz
Copy link
Contributor

I know it's in progress but I have questions Gus.

How this line uid(v) <email> "someone@gmail.com" .will look like?
It will have the angular brackets?<0x1> <email> "someone@gmail.com .

Can you add a test for multiple operation? like:

txn {
  query {
    me(func: eq(email, "someone@gmail.com")) {
      v as uid
    }
   me2(func: eq(email, "otherperson@gmail.com")) {
      F as uid
    }
  }

  mutation {
    set {
      uid(v) <friend> uid(F) .
    }
  }
}

@srfrog
Copy link
Contributor Author

srfrog commented Mar 29, 2019

How this line uid(v) "someone@gmail.com" .will look like?

The uid(v) is translated internally to <0x1>, for example. That is if v=0x1. Only the first value is used, eventually it could run multiple mutations for each of the uid's. But that was not specified for now.

Can you add a test for multiple operation?

The query is just like any regular query, so you can do whatever you want. The only requirement, for now, is that it yields a variable value. This might change as we add more features.

@srfrog srfrog marked this pull request as ready for review April 1, 2019 17:20
@srfrog srfrog requested a review from a team April 1, 2019 17:20
Copy link

@gitlw gitlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on @srfrog)


edgraph/server.go, line 400 at r2 (raw file):

		return nil, err
	}
	if req.StartTs == 0 {

It seems like this condition is impossible, since the mutation will be given a non-zero timestamp in the doMutate function. Maybe change it to an assert?


edgraph/server.go, line 470 at r2 (raw file):

	// Parse query and process
	var queryVars map[string][]string

This might be a controversial comment, and I know it already happens in some places of our code base, as reflected by the gocyclo score https://goreportcard.com/report/github.com/dgraph-io/dgraph#gocyclo .
I'm never a fan of super long functions, since it just makes it much harder for someone trying to understand the code.
Hence I would very much this newly added logic to be extracted out to a separate function.


gql/parser_mutation_test.go, line 153 at r2 (raw file):

		query {
			me(func: eq(email, "someone@gmail.com")) {
				v as uid

what would happen if the variable ends up representing more than one uids, e.g.
me(func: has(email)) {
v as uid
}

also do we support have more than one variable bindings?
query{
lucas(func: eq(email, "lucas@dgraph.io")) {
l as uid
}

gus(func: eq(email, "gus@dgraph.io")) {
g as uid
}
}

mutation {
set {
uid(l) uid(g)
}
}

Copy link
Contributor Author

@srfrog srfrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on @gitlw)


edgraph/server.go, line 400 at r2 (raw file):

Previously, gitlw (Lucas Wang) wrote…

It seems like this condition is impossible, since the mutation will be given a non-zero timestamp in the doMutate function. Maybe change it to an assert?

Sounds good


edgraph/server.go, line 470 at r2 (raw file):

Previously, gitlw (Lucas Wang) wrote…

This might be a controversial comment, and I know it already happens in some places of our code base, as reflected by the gocyclo score https://goreportcard.com/report/github.com/dgraph-io/dgraph#gocyclo .
I'm never a fan of super long functions, since it just makes it much harder for someone trying to understand the code.
Hence I would very much this newly added logic to be extracted out to a separate function.

I understand, but that's turning into a code cleanup which is not part of this feature. I'll try to move out as much logic as I can.


gql/parser_mutation_test.go, line 153 at r2 (raw file):

what would happen if the variable ends up representing more than one uids

the uids get collected in the queryVars slice, see GetUids(). laster we might have logic to run multiple mutations for each uid, but that hasn't been discussed yet.

also do we support have more than one variable bindings

yes, multiple vars are supported.

Copy link
Contributor Author

@srfrog srfrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 3 unresolved discussions (waiting on @gitlw)


edgraph/server.go, line 400 at r2 (raw file):

Previously, srfrog (Gus) wrote…

Sounds good

Done.


edgraph/server.go, line 470 at r2 (raw file):

Previously, srfrog (Gus) wrote…

I understand, but that's turning into a code cleanup which is not part of this feature. I'll try to move out as much logic as I can.

Done.


gql/parser_mutation_test.go, line 153 at r2 (raw file):

Previously, srfrog (Gus) wrote…

what would happen if the variable ends up representing more than one uids

the uids get collected in the queryVars slice, see GetUids(). laster we might have logic to run multiple mutations for each uid, but that hasn't been discussed yet.

also do we support have more than one variable bindings

yes, multiple vars are supported.

Done.

@srfrog srfrog removed the in progress label Apr 4, 2019
Copy link

@gitlw gitlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 8 unresolved discussions (waiting on @gitlw and @srfrog)


chunker/rdf/parse.go, line 188 at r3 (raw file):

		rnq.ObjectValue = &api.Value{Val: &api.Value_DefaultVal{DefaultVal: oval}}
	}
	if (len(rnq.Subject) == 0 && rnq.VarName == nil) || len(rnq.Predicate) == 0 {

It seems we should not add the logical AND of checking var name here,
because even when the VarName is not nil, the Subject should still not be empty, right?


chunker/rdf/parse.go, line 194 at r3 (raw file):

		return rnq, x.Errorf("No Object in NQuad. Input: [%s]", line)
	}
	if (!sane(rnq.Subject) && rnq.VarName == nil) || !sane(rnq.Predicate) ||

Same argument as above, even when the VarName is not nil, the Subject should still be sane.


edgraph/server.go, line 427 at r3 (raw file):

	if len(queryVars) == 0 {
		// TODO: remove this error when mutation conditional expressions are added.
		return nil, x.Errorf("No variables defined in conditional query: %q", req.Query)

As we discussed, there should not be an error when the conditional query does not match any uid.


gql/parser.go, line 589 at r3 (raw file):

	if len(defines) > len(needs) {
		// return x.Errorf("Some variables are defined but not used\nDefined:%v\nUsed:%v\n",

Since a regular query parsing also uses this code block, this change would affect those as well.
For now, maybe put a comment explaining that the defined variables for a conditional query are not used in the query block, but the mutation block below.


gql/parser_mutation.go, line 67 at r3 (raw file):

		if item.Typ == itemRightCurl {
			// mutations must be enclosed in a single block.
			if !inTxn && it.Next() && it.Item().Typ != lex.ItemEOF {

Adding this extra condition for reporting error would make the parser accept input like
txn {
query{
...}
mutation {
...}
some unexpected input
}
and that's not good.

Copy link

@gitlw gitlw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 13 unresolved discussions (waiting on @gitlw and @srfrog)


gql/parser_mutation.go, line 86 at r3 (raw file):

func parseMutationCondQuery(it *lex.ItemIterator) (string, error) {
	var query string
	var parse bool

maybe rename this var to "hasQuery"?


gql/parser_mutation.go, line 92 at r3 (raw file):

		case itemLeftCurl:
			continue
		case itemMutationContent:

I think this should be renamed to itemCondQueryOrMutationContent


gql/parser_mutation_test.go, line 291 at r3 (raw file):

	_, err := ParseMutation(m)
	require.Error(t, err)
	require.Contains(t, err.Error(), `Invalid character '}' inside query text`)

The error msg "Invalid character '}' inside query text" does not seem to be very helpful in finding out where the parsing error happens. Can you please double check the whole error msg and make sure the line and column numbers work correctly?


gql/state.go, line 437 at r3 (raw file):

		switch word {
		case "query":
			l.Emit(itemMutationTxnOp)

Do you feel it's better to have separate item values to represent the query and mutation inside a txn?


gql/state.go, line 475 at r3 (raw file):

}

func lexTextQuery(l *lex.Lexer) lex.StateFn {

Maybe rename to lexCondQuery?

srfrog added 4 commits April 4, 2019 15:05
…ts queries

The function checkDependency() checks that variables defined in a query are used, and
that used variables are actually defined. Well this breaks conditional mutations since
we use a query to define variables that are used not in the same query block, but in
a separate mutation block. Nonetheless, the user must be informed of this so a warning
output is generated.
Copy link
Contributor Author

@srfrog srfrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 13 unresolved discussions (waiting on @gitlw and @srfrog)


chunker/rdf/parse.go, line 188 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

It seems we should not add the logical AND of checking var name here,
because even when the VarName is not nil, the Subject should still not be empty, right?

There won't be a subject because there's no <> parsed, it uses a keyword, uid.

See: https://github.com/dgraph-io/dgraph/blob/srfrog/issue-3059_new_txn_operation/chunker/rdf/state.go#L79

But I think we need to make rnq.SubjectVarName and rnq.ObjectVarName instead. So if rnq.Subject and rnq.SubjectVarName are empty, then that's an error. Same with object.


chunker/rdf/parse.go, line 194 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

Same argument as above, even when the VarName is not nil, the Subject should still be sane.

Like above, subject content never gets parsed.


gql/parser_mutation.go, line 67 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

Adding this extra condition for reporting error would make the parser accept input like
txn {
query{
...}
mutation {
...}
some unexpected input
}
and that's not good.

That is true for txn but not for regular mutations. The side effect is that anything after the mutation's last } is ignored as comment because parsing has stopped.

I will work on this some more when other things are solid. This means changing the lexer which could mean updating to all the parsing.


gql/parser_mutation.go, line 86 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

maybe rename this var to "hasQuery"?

all the parsing functions have a signature parse... like parseMutationOp below this one, I'm just following that.


gql/parser_mutation.go, line 92 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

I think this should be renamed to itemCondQueryOrMutationContent

that sounds like it could break some law of thermodynamics.
how about itemMutationOpContent ? since query and mutation are operators.


gql/state.go, line 437 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

Do you feel it's better to have separate item values to represent the query and mutation inside a txn?

yes, they are orthogonal values. there is really nothing preventing us from adding a new schema{} block in there, for example.


gql/state.go, line 475 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

Maybe rename to lexCondQuery?

it can be lexTextCondQuery we need text part to indicate what we emit.

Copy link
Contributor Author

@srfrog srfrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 13 unresolved discussions (waiting on @gitlw and @srfrog)


edgraph/server.go, line 427 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

As we discussed, there should not be an error when the conditional query does not match any uid.

Done.


gql/parser.go, line 589 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

Since a regular query parsing also uses this code block, this change would affect those as well.
For now, maybe put a comment explaining that the defined variables for a conditional query are not used in the query block, but the mutation block below.

Done.

Copy link
Contributor Author

@srfrog srfrog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 10 files reviewed, 13 unresolved discussions (waiting on @gitlw)


gql/parser_mutation_test.go, line 291 at r3 (raw file):

Previously, gitlw (Lucas Wang) wrote…

The error msg "Invalid character '}' inside query text" does not seem to be very helpful in finding out where the parsing error happens. Can you please double check the whole error msg and make sure the line and column numbers work correctly?

Changed error and now it reads:

...at line 3 column 8: Unbalanced '}' found inside query text" inside of txn block."

Correctly indicating the beginning of the query block and that the brace is unbalanced.

srfrog added 4 commits April 4, 2019 17:43
… subject vars

Update the API change for subject vars to api.NQuad.SubjectVar.
Add support for object vars using api.NQuad.ObjectVar.

If n-quad subject value and var name are empty this is an error.
If n-quad object id and var name are empty this is also an error.
@srfrog srfrog requested a review from manishrjain April 9, 2019 17:31
Copy link
Contributor

@manishrjain manishrjain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 10 files at r1, 2 of 5 files at r2, 8 of 8 files at r4.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @gitlw and @srfrog)


chunker/rdf/parse.go, line 84 at r4 (raw file):

			switch item.Typ {
			case itemSubjectVarName:
				rnq.SubjectVar = item.Val

Try to get rid of NQuad changes.


edgraph/server.go, line 389 at r4 (raw file):

// doCondQuery processes a conditional query within the same transaction of a mutation.
// Returns a map with the vars created from the query, otherwise nil and an error.
func doCondQuery(ctx context.Context, l *query.Latency, req *api.Request,

Can share a lot of logic with the Query func. So, refactor that out in another func and use it in both Query and txn mutation.


gql/parser_mutation.go, line 48 at r4 (raw file):

		var err error
		// Get the query text: txn{ query { ... }}
		mu.CondQuery, err = parseMutationCondQuery(it)

mu.Query, err = parseMutationQuery(it)

@mangalaman93 mangalaman93 force-pushed the srfrog/issue-3059_new_txn_operation branch 2 times, most recently from 5dac02f to 5421819 Compare May 6, 2019 14:16
@mangalaman93 mangalaman93 self-assigned this May 13, 2019
@martinmr martinmr closed this May 14, 2019
@mangalaman93 mangalaman93 mentioned this pull request May 15, 2019
21 tasks
@joshua-goldstein joshua-goldstein deleted the srfrog/issue-3059_new_txn_operation branch August 11, 2022 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Support a new Upsert operation
6 participants